Feature Extraction & Image Processing,
M. S. Nixon and A. S. Aguado
Elsevier/ Butterworth Heinmann, 2nd Edition 2007
This worksheet is the companion to Chapter 2. We will start with one dimensional signal analysis, move on to sampling theory and then study processing images in the frequency domain. The worksheet follows the text directly and allows you to process basic images.
Rather than start with images (which are two dimensional - spatial - signals) we'll start with one dimensional signals. Consider for example, a signal p(t) which is a function of time t:
This signal is a continuous function of time. We can transform the signal into the frequency domain, its frequency components, using the Fourier transform which is defined as:
To obtain its frequency domain version, by substitution in the definition for the Fourier transform we get
And Mathcad's symbolic manipulation (select the whole function by clicking on the assignment operator and then choose evaluate from the symbolic menu) gives the result
We can't plot Fp(w) directly since it is undefined for w=0. To handle the singularities, we'll change Fp(w) to be
This is the sinc function, (sin(x))/x = sinc x. It suggests that a pulse is made up of a lot of low frequencies (the biggest frequency domain components are near the origin) and only few high frequency components. Try altering the width of the pulse, making it smaller and larger. Do you expect the result you see?
The Fourier transform tells us the frequency components in the original signal. The magnitude of the transform at a particular frequency indicates how much of that frequency is in the time domain signal. Let's have a look at some of the contributions. We'll look at the contribution of two of the low frequencies first, for w = 1 and w = 2
So there's more of the low frequency (w = 1) than there is of the high frequency (w = 2) as shown in the plot of the transform. Let's look at some higher frequencies, w = 3 and w = 4
So the contribution for w = 3 is less than that for w = 4. That is to be expected since there are points where the transform components are zero (reflecting no contribution). Finally, let's look at the contributions for w = 5 and w = 6
So the contribution for w = 5 exceeds that due to w = 6. Let's integrate the contributions from the frequency components to see what we get
Integrating the frequency components has taken us back to something which is very close to the original pulse. The difference is because there are more frequency components above w=6 and when we integrate them (by extending the integration limits) then we get a signal which is much closer to the original signal.
Note that the the result of the Fourier transform is actually a complex number. We often display/ interpret the Fourier transform in terms of its magnitude and phase where (for a complex number), the magnitude is given by
The magnitude tells us how much of each frequency is in the original signal, the phase tells us when. The magnitude of the transform is a pulse is similar to the previously depicted transform, but in positive form. The phase varies between 0 and 2p radians, consistent with the atan function.
The inverse Fourier transform takes us back from the transform to the time domain. It is used to reconstruct the original time domain signal from its transform components. The inverse FT is defined as
In order to develop transform theory further, we shall need the delta function. This has a value of one at a particular time, here at time t and it is zero for all other values of time.
The Fourier transform of a pulse is a sinc function. These are known as a transform pair, since the Fourier transform of a sinc function is a pulse. We can generate Fourier transforms by using Mathcad's symbolic transform operator. We shall use this to develop the transform pairs shown here. Other transform pairs include:
i) cosine wave
Time Domain Frequency Domain
The magnitude of the Fourier transform shows the frequency of the original cosine wave.
ii) a Gaussian function with variance
The Fourier transform is also a Gaussian function, illustrating linearity in the transform process. Try altering the standard deviation, s, how does this affect the width (in the time and in the frequency domains)?
The Fourier transform shows that adding up an infinite set of frequencies gives a delta function. Check this by making the width of the earlier pulse very small, what happens to its transform?
and iv) for a sequence of spikes we need a different version of our delta function which has a value of unity between a period when it is zero. The period controls the spacing of the delta functions:
The sequence of spikes is actually used to sample a signal. If we multiply a chosen signal by the series of spikes, then we will get a sequence of samples (or instantaneous values) of the signal we have sampled. Physically, these are the outputs of an Analogue to Digital converter which feeds samples of an analogue (continuous) signal into a digital computer. If we take these samples too slowly, then we won't be able to reconstruct the original signal which was sampled. If we take the samples very quickly, we'll generate lots of data to be analysed. Nyquist's sampling theorem states that:
In order for a signal to be reconstructed from its samples, it must be sampled
at at least twice the maximum signal in the original signal.
So for speech (say maximum frequency 6 kHz) we need to sample at 12 kHz. This process gives us discrete, sampled, signals as opposed to continuous ones. Accordingly, we need a Discrete Fourier Transform (DFT) to handle these sampled signals. The DFT transforms a set of samples of a function into a set of samples of frequency. So we'll need some pointers to the sample values:
where px and Fpu are the samples of a signal and the samples of frequency, respectively. For a pulse which is of amplitude A for the first 5 samples and zero for the remaining ones, the Fourier transform is:
The range of frequencies excludes zero, since this gives a singularity in the computation of Fp0. So if we redefine the range of frequencies to include zero frequency, and define Fpu accordingly, we have:
Now try altering them width of the pulse. Do you expect what you see? Is the effect the same as for continuous signals?
This is an analytic version for a simple, known, relationship. The DFT is usually used direct, without resort to symbolic evaluation, since most signals of interest do not have an explicit analytical representation. So for a sampled signal defined by:
The components of the transform suggest how we can add up samples of frequency to return to the original sampled signal. The result of adding these up is shown below.
This takes us back to the original signal. There are some differences between the reconstructed signal and its original version, implying that some information has been lost. This process is, of course, the inverse DFT which calculates a signal from its transformed version, according to:
We'll now move to two dimensional signals: we need a image to process. Let's use some vertical lines on a dark background, this gives an image:
We'll usually view this as a matrix when we want to see what happened to the numbers, and as an image when we want to see effects. Every picture tells......
As a reminder, Indices to P are row, column. So for a pixel Py,x the index y is downwards, starting at 0, x is across. We can also display it as a picture
The 2D Fourier transform decomposes a picture into its constituent spatial frequencies. The result is a matrix of the same size as the original picture. So when transforming p we will get an 8*8 picture; the size N is 8. We then need pointers which address the eight points:
We can obtain a two dimensional discrete Fourier transform, by using a standard version of the formula, as:
The latter expression uses Mathcad's vectorise operator which treats all matrix elements separately. As such, we form the modulus, or magnitude, of each individual transform pixel.
The picture of this is the magnitude spectrum. This shows that we have vertical bars only; these are horizontal spatial frequencies (since brightness varies across the picture).
We'll use the Mathcad Fourier transform (FT) function icfft to transform the image. icfft implements the earlier FT, but using a fast algorithm, called the Fast Fourier Transform (FFT). This requires a square picture 2n*2n (i.e. for 8*8, n=3).
and we obtain the transform of an image of horizontal bars by applying this function to a transposed version of the image (matrix) of vertical bars as:
We have now transformed a picture of horizontal lines. This reflects vertical spatial frequencies (since brightness varies downwards and is constant across the picture). So the resulting transform should swap round, which it does.
The inverse 2D Fourier transform should take us back to the original picture, from the transformed version . It's defined as:
The one we've transformed back is the Fourier transform of the transposed original picture. So we've got back to the right place!
This gives us our transform pair. Applying the inverse FT to the result of the FT should take us back to where we started from:
It does! So let's look at the Fourier transform of an image of a square on a dark background, with a bit of added noise. This gives a matrix:
Now we see a more complex arrangement of spatial frequencies, both vertical and horizontal ones. When we add these together, we can get back to our original image of a square
We actually need to shift the Fourier transform around a bit. This is because it is conventionally displayed with the lower frequency components in the centre, and the higher frequency ones outside. The lowest frequency component is actually called the d.c. component (which is like the d.c. in electricity, which is a voltage which doesn't change). The d.c. component measures the total energy in a picture and is proportional to the sum of the pixel values. Spreading out from the (central) d.c. component, the frequency increases. To picture a transform in this way, we can rotate each quadrant by 1800.
So the the components now appear to be centred on the middle of the picture, rather than on the corners. This is how the textbooks usually display transform data. This is only for display purposes. If you want to get back to the original picture, you just need to apply the operator again.
By using the shift property of the Fourier transform, we can specify the rearrangement process in more compact form as:
and this is the same as the previous picture, rearranged purely for visualisation, not changing the frequency domain information. The function will be used before using the Fourier transform, or after using the inverse transform. Let's have a look at a transform of a real image. We'll display it by taking the logarithm of each point: this is so that we can see the result (try taking the logarithm out to see its effect). First, we'll read the image in:
Notice how a lot of the frequency information is centred near the d.c. component. This shows how transforms can be used to code images (you just use the frequency components which carry the information, rather than the ones which have none).
One of the properties of the Fourier transform is called shift invariance. This states that magnitude of the Fourier transform of a shifted (delayed) signal is the same as that of the original (unshifted) signal. We can move a picture along the y axis (wrapping round the bits that fall off - called cyclic shift) by using a function:
These are the same, hence proving that shift invariance is true. This can be important in, say, texture analysis. We now know that if we take a small picture of a large texture, so long as the texture is spatially homogeneous, then the Fourier magnitude spectrum won't change wherever we take the picture. One thing does change though, that's the phase. Now:
So it's the phase that changes, not the magnitude. This is how phase affects how the spatial frequencies are added up. Try it for other amounts of picture shift.
If you want to show this for vertical and horizontal shifts, shifts, you'll need to move the picture by the (general) operator
The Fourier transform also rotates with the image rotation. We'll just do rotation by 90o by using matrix transformation. The transform also rotates by 90o. This is to be expected since horizontal spatial frequencies have become vertical ones and vice versa.
The last property of major interest is scaling. The scaling of frequency samples is inversely proportional to the scaling of the image samples. Naturally, this is due to the inverse relationship between time and frequency. Scaling a picture down (by division) makes its frequencies higher, as expected.
Let's now look at some other transforms, first the Discrete Cosine Transform (DCT). This is defined as:
We need to be able to get back to the picture domain, so we need the inverse DCT which is defined as:
Applying this to a real image would take a long time. There are routines, some based on the FFT, which allow for fast implementation. The major advantage of the DCT is energy compaction: transform images (via the DCT) have fewer components than those obtained by the FFT. As such the DCT is the major component of image coding systems, like JPEG and MPEG. Another transform is the Hartley transform: this uses purely real arithmetic to transform from one image to another as:
Finally, let's look at some introductory applications of the Fourier transform. The first application concerns filtering the image. We need a spatial filter, so we'll use a circle. This is centred on the d.c. component and you choose the radius to select frequency components. If we choose those components within a circle centred on the origin, we get low frequency components and delete the high frequency ones. This is called low-pass filtering. We need a radius first, then we low-pass filter the Fourier transform:
The low-pass filter actually has a blurring function, due to its exclusion of high frequency components. If you set the radius to a low value, you end up with a very blurred picture. If you set the radius to a high value, excluding none of the frequency components, you'll return to the original image. Check out different values of the radius: try (say) 2, 10 and 20.
So let's look at high-pass filtering, where the low frequency components are deleted and the high frequency ones are retained. Again, we'll use a circle so we'll need a radius and a high-pass filter:
The high-pass filter actually selects regions where the brightness changes sharply, since this gives rise to high frequency information. Accordingly, the result of the high-pass filter shows up the edges of eyes, and the corners in particular. Again, try different values for the radius and see (and explain) what you get.
Note that the rearrangement process is only used to visualise the transforms. Implementation code does not need to use the rearrangement process. As an exercise, what form would the low-pass and high-pass filters take if the rearrangement process was not used?
That's the end of images, sampling and frequency domain processing. There's a lot more material that could be included (there are many more transforms, for example), but isn't since it is more in the realms of image processing as opposed to computer vision. We will however continue to use Fourier theory, sometime because it helps us to understand techniques, sometimes because it allows us fast implementation and sometimes because we need spatial frequency concepts. So don't forget it all!